Cairo Governorate
- Europe > Switzerland > Zürich > Zürich (0.04)
- North America > United States > Texas (0.04)
- Europe > Norway > Western Norway > Vestland > Bergen (0.04)
- (2 more...)
- Leisure & Entertainment > Games (1.00)
- Transportation > Ground > Rail (0.45)
- Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.15)
- Europe > Germany > Brandenburg > Potsdam (0.04)
- Asia > China > Shanghai > Shanghai (0.04)
- (9 more...)
- Workflow (1.00)
- Research Report > New Finding (0.46)
- Law (1.00)
- Government (0.93)
- Information Technology (0.69)
Language Model Tokenizers Introduce Unfairness Between Languages
Recent language models have shown impressive multilingual performance, even when not explicitly trained for it. Despite this, there are concerns about the quality of their outputs across different languages. In this paper, we show how disparity in the treatment of different languages arises at the tokenization stage, well before a model is even invoked. The same text translated into different languages can have drastically different tokenization lengths, with differences of up to 15 times in some cases. These disparities persist even for tokenizers that are intentionally trained for multilingual support.
- North America > Haiti (0.14)
- Asia > Philippines > Luzon > Ilocos Region > Province of Pangasinan (0.04)
- Europe > Switzerland > Zürich > Zürich (0.04)
- (38 more...)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.70)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.69)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)
- Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.68)
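The disparity the abstract describes can be illustrated without the paper's actual tokenizers. The minimal sketch below uses UTF-8 byte counts as a stand-in for a pure byte-level tokenizer (one token per byte); the sample sentences and the resulting ratios are illustrative only and do not reproduce the paper's up-to-15x measurements.

```python
# Minimal sketch of cross-language tokenization-length disparity.
# UTF-8 byte counts stand in for a byte-level tokenizer; real subword
# tokenizers differ, but the length gap across scripts persists.

def byte_token_count(text: str) -> int:
    """Length of the sequence a pure byte-level tokenizer would emit."""
    return len(text.encode("utf-8"))

# The same greeting in three scripts (illustrative sample sentences).
samples = {
    "English": "Hello, how are you today?",
    "German": "Hallo, wie geht es dir heute?",
    "Hindi": "नमस्ते, आज आप कैसे हैं?",
}

counts = {lang: byte_token_count(s) for lang, s in samples.items()}
baseline = counts["English"]
for lang, n in counts.items():
    print(f"{lang:8s} {n:3d} bytes  ({n / baseline:.2f}x English)")
```

Latin-script text is one byte per character, while each Devanagari character costs three bytes, so the Hindi sentence produces a markedly longer sequence for the same content, which is the mechanism behind the unfairness the paper studies.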
4 times drinking coffee was illegal--or even punishable by death
Rulers once closed cafés, burned beans, and even executed someone--all for a cup of coffee. A photograph taken in the 1920s shows a group of men gathered at a small roadside coffee stall in Cairo, Egypt. Bach wrote a cantata about it. Scholars, philosophers, and lawyers have argued over it.
- Africa > Middle East > Egypt > Cairo Governorate > Cairo (0.25)
- Europe > Middle East > Republic of Türkiye > Istanbul Province > Istanbul (0.07)
- Asia > Middle East > Republic of Türkiye > Istanbul Province > Istanbul (0.07)
- (7 more...)
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.04)
- North America > United States > Illinois (0.04)
- North America > United States > California (0.04)
- (5 more...)
- Media > News (1.00)
- Health & Medicine > Therapeutic Area > Psychiatry/Psychology (0.72)
- Government > Regional Government > North America Government > United States Government (0.70)
- Transportation > Ground > Road (0.47)
A variational Bayes latent class approach for EHR-based patient phenotyping in R
Buckley, Brian, O'Hagan, Adrian, Galligan, Marie
As regulatory agencies increasingly recognise real-world evidence as a complement to traditional clinical trial data, interest has grown in applying Bayesian methods across both interventional and observational research (Boulanger and Carlin (2021)). A central objective in many clinical investigations is the delineation of patient subgroups that exhibit comparable disease-related characteristics (He, Belouali, Patricoski, Lehmann, Ball, Anagnostou, Kreimeyer, and Botsis (2023)). Electronic Health Records (EHR) have become an important resource for such phenotypic analyses (Hripcsak and Albers (2013)). Bayesian approaches to patient phenotyping in clinical observational studies have been limited by the computational challenges associated with applying the Markov Chain Monte Carlo (MCMC) approach to real-world data. Hubbard, Huang, Harton, Oganisian, Choi, Utidjian, Eneli, Bailey, and Chen (2019) proposed a Bayes latent class model that could be used in a general context for observational studies that use EHR data. They consider the common clinical context where gold-standard phenotype information, such as genetic and laboratory data, is not fully available. A general model of this form has high potential applicability for use in clinical decision support across disease areas for both primary and secondary clinical databases. Latent Class Analysis (LCA) is widely used when we want to identify patient phenotypes or subgroups given multivariate data (Lanza and Rhoades (2013)). A challenge in clinical LCA is the prevalence of mixed data, where we may have combinations of continuous, nominal, ordinal and count data.
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Africa > Middle East > Egypt > Cairo Governorate > Cairo (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.86)
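The latent class model underlying this kind of phenotyping can be sketched in a few lines. The example below fits a two-class model with three binary EHR-style indicators by plain maximum-likelihood EM; it is NOT the paper's variational Bayes estimator, and the class prevalences, item probabilities, and data are all synthetic, chosen only to show the E-step/M-step mechanics.

```python
import random

# Toy latent class model: two phenotype classes, three binary EHR
# indicators. Plain maximum-likelihood EM for the underlying LCA
# model; the paper instead uses a variational Bayes approach.

random.seed(0)

TRUE_PI = 0.4                       # prevalence of class 1 (synthetic)
TRUE_THETA = [[0.9, 0.8, 0.7],      # P(indicator j = 1 | class 0)
              [0.2, 0.1, 0.3]]      # P(indicator j = 1 | class 1)

def simulate(n):
    data = []
    for _ in range(n):
        k = 1 if random.random() < TRUE_PI else 0
        data.append([1 if random.random() < TRUE_THETA[k][j] else 0
                     for j in range(3)])
    return data

def likelihood(x, theta_row):
    p = 1.0
    for xj, tj in zip(x, theta_row):
        p *= tj if xj else (1 - tj)
    return p

def em(data, iters=50):
    pi, theta = 0.5, [[0.6, 0.6, 0.6], [0.4, 0.4, 0.4]]
    for _ in range(iters):
        # E-step: responsibility of class 1 for each patient record.
        resp = []
        for x in data:
            a = (1 - pi) * likelihood(x, theta[0])
            b = pi * likelihood(x, theta[1])
            resp.append(b / (a + b))
        # M-step: update prevalence and per-class item probabilities.
        pi = sum(resp) / len(data)
        for k in (0, 1):
            w = [r if k == 1 else 1 - r for r in resp]
            tot = sum(w)
            theta[k] = [sum(wi * x[j] for wi, x in zip(w, data)) / tot
                        for j in range(3)]
    return pi, theta

data = simulate(2000)
pi_hat, theta_hat = em(data)
print(f"estimated prevalence of class 1: {pi_hat:.2f}")
```

A variational Bayes fit replaces the point-estimate M-step with updates to approximate posterior distributions over the same parameters, which is what makes it tractable on real-world EHR data where MCMC struggles.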
Roman generals gifted kittens and piglets to their pet monkeys
The macaques were status symbols all the way from India. Elites in Ancient Rome went to great lengths to advertise their status and wealth. Based on recent archaeological excavations in Egypt, at least some high-ranking military officials even showed off with their choice of pets. Researchers at Poland's University of Warsaw described a nearly 2,000-year-old animal cemetery in the Egyptian port city of Berenike that includes the remains of multiple macaque monkeys.
- Europe > Poland > Masovia Province > Warsaw (0.25)
- Asia > India (0.25)
- Europe > Switzerland (0.05)
- Africa > Middle East > Egypt > Cairo Governorate > Cairo (0.05)
A Hybrid Model for Stock Market Forecasting: Integrating News Sentiment and Time Series Data with Graph Neural Networks
Sadek, Nader, Moawad, Mirette, Naguib, Christina, Elzahaby, Mariam
Stock market prediction is a long-standing challenge in finance, as accurate forecasts support informed investment decisions. Traditional models rely mainly on historical prices, but recent work shows that financial news can provide useful external signals. This paper investigates a multimodal approach that integrates companies' news articles with their historical stock data to improve prediction performance. We compare a Graph Neural Network (GNN) model with a baseline LSTM model. Historical data for each company is encoded using an LSTM, while news titles are embedded with a language model. These embeddings form nodes in a heterogeneous graph, and GraphSAGE is used to capture interactions between articles, companies, and industries. We evaluate two targets: a binary direction-of-change label and a significance-based label. Experiments on the US equities and Bloomberg datasets show that the GNN outperforms the LSTM baseline, achieving 53% accuracy on the first target and a 4% precision gain on the second. Results also indicate that companies with more associated news yield higher prediction accuracy. Moreover, headlines contain stronger predictive signals than full articles, suggesting that concise news summaries play an important role in short-term market reactions.
- Africa > Middle East > Egypt > Cairo Governorate > Cairo (0.05)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- North America > United States > District of Columbia > Washington (0.04)
- (2 more...)
- Research Report > New Finding (1.00)
- Overview (1.00)
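The core update in the paper's GNN can be illustrated on a toy graph. The sketch below runs one GraphSAGE-style mean-aggregation layer in pure Python with identity weights; the real system learns these weights and operates on a heterogeneous graph of news articles, companies, and industries, so node names and features here are purely illustrative.

```python
# One GraphSAGE mean-aggregation layer on a toy graph, in pure Python.
# h'_v = ReLU(W_self @ h_v + W_neigh @ mean(h_u for u in N(v)))

def vec_add(a, b):
    return [x + y for x, y in zip(a, b)]

def vec_scale(a, s):
    return [x * s for x in a]

def matvec(W, v):
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in W]

def relu(v):
    return [max(0.0, x) for x in v]

def sage_layer(h, adj, W_self, W_neigh):
    """h: node-id -> feature vector; adj: node-id -> neighbour ids."""
    out = {}
    for v, hv in h.items():
        neigh = adj.get(v, [])
        m = [0.0] * len(hv)
        for u in neigh:
            m = vec_add(m, h[u])
        if neigh:
            m = vec_scale(m, 1.0 / len(neigh))
        out[v] = relu(vec_add(matvec(W_self, hv), matvec(W_neigh, m)))
    return out

# Toy graph: a company node connected to two news-article nodes.
h = {"company": [1.0, 0.0], "news1": [0.0, 2.0], "news2": [0.0, 4.0]}
adj = {"company": ["news1", "news2"],
       "news1": ["company"], "news2": ["company"]}
I = [[1.0, 0.0], [0.0, 1.0]]        # identity weights, for readability
h1 = sage_layer(h, adj, I, I)
print(h1["company"])
```

After one layer the company embedding mixes in the averaged news-article signal, which is how news sentiment reaches the price-prediction head in a model of this shape.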
LLM Output Homogenization is Task Dependent
Jain, Shomik, Lanchantin, Jack, Nickel, Maximilian, Ullrich, Karen, Wilson, Ashia, Watson-Daniels, Jamelle
A large language model can be less helpful if it exhibits output response homogenization. But whether two responses are considered homogeneous, and whether such homogenization is problematic, both depend on the task category. For instance, in objective math tasks, we often expect no variation in the final answer but anticipate variation in the problem-solving strategy. For creative writing tasks, in contrast, we may expect variation in key narrative components (e.g., plot, genre, setting), beyond the vocabulary or embedding diversity produced by temperature sampling. Previous work addressing output homogenization often fails to conceptualize diversity in a task-dependent way. We address this gap in the literature directly by making the following contributions. (1) We present a task taxonomy comprising eight task categories that each have distinct concepts of output homogenization. (2) We introduce task-anchored functional diversity to better evaluate output homogenization. (3) We propose a task-anchored sampling technique that increases functional diversity for task categories where homogenization is undesired, while preserving it where it is desired. (4) We challenge the perceived existence of a diversity-quality trade-off by increasing functional diversity while maintaining response quality. Overall, we demonstrate how task dependence improves the evaluation and mitigation of output homogenization.
- Europe > United Kingdom (0.14)
- Asia > Pakistan (0.04)
- North America > United States > New York (0.04)
- (7 more...)
- Information Technology (0.92)
- Law Enforcement & Public Safety (0.67)
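The task-dependent notion of "functionally the same response" can be made concrete with a per-task canonicalization. The sketch below is illustrative only: the two task categories and their canonicalizers are hypothetical stand-ins, not the paper's actual eight-category taxonomy or its diversity metric.

```python
# Sketch of task-dependent functional diversity: whether two responses
# count as "the same" depends on a per-task canonicalization.

def canon_math(response: str) -> str:
    # For math, only the final answer matters; strategies may differ.
    return response.rsplit("=", 1)[-1].strip()

def canon_creative(response: str) -> str:
    # For creative writing, anchor on a coarse narrative component
    # such as genre (hypothetical keyword list).
    for genre in ("mystery", "romance", "sci-fi"):
        if genre in response.lower():
            return genre
    return "other"

CANON = {"math": canon_math, "creative": canon_creative}

def functional_diversity(task: str, responses: list[str]) -> float:
    """Fraction of functionally distinct responses, under the task's lens."""
    canon = CANON[task]
    return len({canon(r) for r in responses}) / len(responses)

math_runs = ["2+2 = 4", "4+0 = 4", "double 2 = 4"]
story_runs = ["A mystery in Cairo...", "A sci-fi heist...",
              "A mystery at sea..."]
print(functional_diversity("math", math_runs))       # answers agree
print(functional_diversity("creative", story_runs))  # genres vary
```

Under the math lens the three runs collapse to one functional answer (low diversity is desirable there), while the creative runs remain distinct at the genre level, matching the abstract's point that the same surface-level variation can be judged differently per task.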
Formal that "Floats" High: Formal Verification of Floating Point Arithmetic
Mohanty, Hansa, Viswambharan, Vaisakh Naduvodi, Gadde, Deepak Narayan
Formal verification of floating-point arithmetic remains challenging due to non-linear arithmetic behavior and the tight coupling between control and datapath logic. Existing approaches often rely on high-level C models for equivalence checking against Register Transfer Level (RTL) designs, but this introduces abstraction gaps, translation overhead, and limits scalability at the RTL level. To address these challenges, this paper presents a scalable methodology for verifying floating-point arithmetic using direct RTL-to-RTL model checking against a golden reference model. The approach adopts a divide-and-conquer strategy that decomposes verification into modular stages, each captured by helper assertions and lemmas that collectively prove a main correctness theorem. Counterexample (CEX)-guided refinement is used to iteratively localize and resolve implementation defects, while targeted fault injection validates the robustness of the verification process against precision-critical datapath errors. To assess scalability and practicality, the methodology is extended with agentic AI-based formal property generation, integrating large language model (LLM)-driven automation with Human-in-the-Loop (HITL) refinement. Coverage analysis evaluates the effectiveness of the approach by comparing handwritten and AI-generated properties in both RTL-to-RTL model checking and standalone RTL verification settings. Results show that direct RTL-to-RTL model checking achieves higher coverage efficiency and requires fewer assertions than standalone verification, especially when combined with AI-generated properties refined through HITL guidance.
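The shape of the equivalence obligation, and of fault injection producing a counterexample, can be shown on a toy. The sketch below checks a ripple-carry "implementation" against a golden reference exhaustively at 4-bit width; a real flow runs a model checker on the actual floating-point RTL rather than enumerating inputs, and the stuck-carry defect here is an invented example.

```python
# Sketch of the RTL-to-RTL equivalence obligation as an exhaustive
# check over a tiny bit-width, with one injected fault. Real flows
# use a model checker on the floating-point RTL; this toy uses a
# 4-bit integer adder so the whole input space can be enumerated.

WIDTH = 4
MASK = (1 << WIDTH) - 1

def golden_add(a, b):
    """Golden reference model: 4-bit wrap-around addition."""
    return (a + b) & MASK

def impl_add(a, b, fault=False):
    """'Implementation': ripple-carry adder, with optional stuck carry."""
    carry, out = 0, 0
    for i in range(WIDTH):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        s = ai ^ bi ^ carry
        carry = (ai & bi) | (carry & (ai ^ bi))
        if fault and i == 2:
            carry = 0          # injected defect: carry stuck at 0 in bit 2
        out |= s << i
    return out

def equivalence_check(fault=False):
    """Exhaust all inputs; return the first counterexample, or None."""
    for a in range(1 << WIDTH):
        for b in range(1 << WIDTH):
            if impl_add(a, b, fault) != golden_add(a, b):
                return (a, b)   # CEX that would guide refinement
    return None

print("clean design: ", equivalence_check(fault=False))
print("faulty design:", equivalence_check(fault=True))
```

The clean design passes every input, while the injected defect yields a concrete counterexample, mirroring how CEX-guided refinement localizes a datapath bug and how fault injection confirms the verification environment actually catches such bugs.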